306 research outputs found

    Towards trustworthy phoneme boundary detection with autoregressive model and improved evaluation metric

    Phoneme boundary detection has been studied due to its central role in various speech applications. In this work, we point out that this task needs to be addressed not only algorithmically but also through the evaluation metric. To this end, we first propose a state-of-the-art phoneme boundary detector that operates in an autoregressive manner, dubbed SuperSeg. Experiments on the TIMIT and Buckeye corpora demonstrate that SuperSeg identifies phoneme boundaries by a significant margin over existing models. Furthermore, we note a limitation of the popular evaluation metric, R-value, and propose new evaluation metrics that prevent each boundary from contributing to the evaluation multiple times. The proposed metrics reveal the weaknesses of non-autoregressive baselines and establish a reliable criterion suited to evaluating phoneme boundary detection. Comment: 5 pages, submitted to ICASSP 202
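To make the double-counting concern concrete, the sketch below computes the R-value in its commonly cited form (hit rate and over-segmentation combined into one score) and contrasts two ways of counting hits. The function names, the 20 ms tolerance, and the greedy one-to-one matcher are illustrative assumptions; the abstract does not describe the paper's proposed metrics in detail, so this is not their definition.

```python
import math

def r_value(n_hits, n_ref, n_pred):
    """R-value in its commonly cited form (percent-based HR and OS)."""
    hit_rate = 100.0 * n_hits / n_ref
    over_seg = 100.0 * (n_pred / n_ref - 1.0)
    r1 = math.sqrt((100.0 - hit_rate) ** 2 + over_seg ** 2)
    r2 = (-over_seg + hit_rate - 100.0) / math.sqrt(2.0)
    return 1.0 - (abs(r1) + abs(r2)) / 200.0

def naive_hits(ref, pred, tol):
    """Hypothetical naive count: every prediction near ANY reference is a hit,
    so one reference boundary can be credited several times."""
    return sum(any(abs(p - r) <= tol for r in ref) for p in pred)

def one_to_one_hits(ref, pred, tol):
    """Greedy matching (an assumption, not the paper's method):
    each reference and each prediction is used at most once."""
    ref, pred = sorted(ref), sorted(pred)
    i = j = hits = 0
    while i < len(ref) and j < len(pred):
        if abs(ref[i] - pred[j]) <= tol:
            hits += 1
            i += 1
            j += 1
        elif pred[j] < ref[i]:
            j += 1
        else:
            i += 1
    return hits

# Two reference boundaries; two predictions crowd around the first one.
ref = [0.10, 0.50]
pred = [0.09, 0.11, 0.50]
tol = 0.02  # seconds, illustrative

print(naive_hits(ref, pred, tol))       # counts the first reference twice
print(one_to_one_hits(ref, pred, tol))  # counts it once
print(r_value(one_to_one_hits(ref, pred, tol), len(ref), len(pred)))
```

With the naive count, the number of hits exceeds the number of reference boundaries, which is the kind of inflation a one-to-one matching rules out.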

    Differentiable Artificial Reverberation

    Artificial reverberation (AR) models play a central role in various audio applications. Therefore, estimating the AR model parameters (ARPs) of a target reverberation is a crucial task. Although a few recent deep-learning-based approaches have shown promising performance, their non-end-to-end training scheme prevents them from fully exploiting the potential of deep neural networks. This motivates the introduction of differentiable artificial reverberation (DAR) models, which allow loss gradients to be back-propagated end-to-end. However, implementing the AR models with their difference equations "as is" in a deep-learning framework severely bottlenecks training speed on a parallel processor such as a GPU, because of their infinite impulse response (IIR) components. We tackle this problem by replacing the IIR filters with finite impulse response (FIR) approximations obtained with the frequency-sampling method (FSM). Using the FSM, we implement three DAR models -- differentiable Filtered Velvet Noise (FVN), Advanced Filtered Velvet Noise (AFVN), and Feedback Delay Network (FDN). For each AR model, we train its ARP estimation networks for the analysis-synthesis (RIR-to-ARP) and blind estimation (reverberant-speech-to-ARP) tasks in an end-to-end manner with its DAR model counterpart. Experimental results show that the proposed method achieves consistent performance improvement over the non-end-to-end approaches in both objective metrics and subjective listening tests. Comment: Manuscript submitted to TASL
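The frequency-sampling idea can be sketched in a few lines: evaluate the IIR transfer function at N uniformly spaced points on the unit circle, then take an inverse FFT to obtain a length-N FIR approximation. The sketch below uses plain NumPy and a one-pole filter purely for illustration; the function name `fsm_fir` and the chosen coefficients are assumptions, and the paper's models are implemented in a differentiable deep-learning framework rather than like this.

```python
import numpy as np

def fsm_fir(b, a, n_points):
    """Length-n_points FIR approximation of the IIR filter B(z)/A(z).

    b, a: coefficients in ascending powers of z^-1, with len <= n_points.
    The zero-padded FFT of the coefficients samples B and A on the unit circle.
    """
    freq_response = np.fft.fft(b, n_points) / np.fft.fft(a, n_points)
    return np.fft.ifft(freq_response).real

# One-pole filter H(z) = 1 / (1 - 0.5 z^-1), whose true impulse response is 0.5**n.
h = fsm_fir([1.0], [1.0, -0.5], n_points=64)
print(h[:4])
```

The result is a time-aliased version of the true impulse response, so the approximation is accurate only when the response has decayed sufficiently within N samples. Because the computation is a ratio of FFTs with no sample-by-sample recursion, it parallelizes well, which is the property the abstract relies on.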

    Generation of Non-uniform Meshes for Finite-Difference Time-Domain Simulations

    In this paper, two automatic mesh generation algorithms are presented. The methods seek to optimize mesh density for geometries exhibiting both fine and coarse physical structures. When generating meshes, the algorithms attempt to satisfy the conditions on the maximum mesh spacing and the maximum grading ratio simultaneously. Both algorithms successfully produce non-uniform meshes that satisfy the requirements for finite-difference time-domain simulations of microwave components. Additionally, an algorithm successfully generates a minimum number of grid points while maintaining the simulation accuracy.
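The two constraints named in the abstract, a maximum mesh spacing and a maximum grading ratio between neighbouring cells, can be illustrated in one dimension. The sketch below is not either of the paper's algorithms (the abstract does not describe them); it simply grows cell sizes geometrically from a fine cell until the spacing cap is reached and then checks both constraints. All names and numbers are hypothetical.

```python
def graded_spacings(fine, max_spacing, max_ratio, count):
    """Grow cell sizes from `fine`, multiplying by max_ratio, capped at max_spacing."""
    spacings = [fine]
    while len(spacings) < count:
        spacings.append(min(spacings[-1] * max_ratio, max_spacing))
    return spacings

def satisfies_constraints(spacings, max_spacing, max_ratio, eps=1e-12):
    """Check the maximum spacing and the neighbour-to-neighbour grading ratio."""
    if any(s > max_spacing + eps for s in spacings):
        return False
    for left, right in zip(spacings, spacings[1:]):
        if max(left, right) / min(left, right) > max_ratio + eps:
            return False
    return True

cells = graded_spacings(fine=0.1, max_spacing=1.0, max_ratio=2.0, count=6)
print(cells)
print(satisfies_constraints(cells, max_spacing=1.0, max_ratio=2.0))
```

Jumping directly from a 0.1 cell to a 1.0 cell would respect the spacing cap but violate the grading ratio, which is why the two conditions have to be satisfied together.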

    High-efficiency Bidirectional Buck-Boost Converter for Residential Energy Storage System

    This paper proposes a bidirectional dc-dc converter for residential micro-grid applications. The proposed converter can operate over an input voltage range that overlaps the output voltage range. This converter uses two snubber capacitors to reduce the switch turn-off losses, a dc-blocking capacitor to reduce the input/output filter size, and a 1:1 transformer to reduce core loss. The windings of the transformer are connected in parallel and in reverse-coupled configuration to suppress magnetic flux swing in the core. Zero-voltage turn-on of the switch is achieved by operating the converter in discontinuous conduction mode. The experimental converter was designed to operate at a switching frequency of 40-210 kHz, an input voltage of 48 V, an output voltage of 36-60 V, and an output power of 50-500 W. The power conversion efficiency for boost conversion to 60 V was >= 98.3% in the entire power range. The efficiency for buck conversion to 36 V was >= 98.4% in the entire power range. The output voltage ripple at full load was <3.59 V-p.p for boost conversion (60 V) and 1.35 V-p.p for buck conversion (36 V) with the reduced input/output filter. The experimental results indicate that the proposed converter is well-suited to smart-grid energy storage systems that require high efficiency, small size, and overlapping input and output voltage ranges.
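As a rough reading of the reported efficiencies, a lower bound on efficiency implies an upper bound on dissipated power. The sketch below assumes efficiency is defined as output power divided by input power, so that the loss is P_out * (1/efficiency - 1); the function name is hypothetical and the figures are derived here from the abstract's numbers, not reported by the authors.

```python
def max_loss_watts(p_out, min_efficiency):
    """Upper bound on loss given output power and a lower bound on efficiency.

    Assumes efficiency = P_out / P_in, hence loss = P_in - P_out.
    """
    return p_out * (1.0 / min_efficiency - 1.0)

# Full load (500 W) using the efficiency bounds quoted in the abstract.
boost_loss = max_loss_watts(500.0, 0.983)  # boost conversion to 60 V
buck_loss = max_loss_watts(500.0, 0.984)   # buck conversion to 36 V
print(round(boost_loss, 2), round(buck_loss, 2))
```

Under this assumption, the converter dissipates at most roughly 8 to 9 W at full load in either direction.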